Convergence to stationary point

For any β-smooth differentiable function ff (convex or not), if we run GD for TT steps, we can find a point 𝐱̂\hat{\mathbf{x}} such that: ||f(𝐱̂)||222βT(f(𝐱(0))f(𝐱*))||\nabla f(\hat{\mathbf{x}})||_2^2 \leq \frac{2\beta}{T}\left(f(\mathbf{x}^{(0)})-f(\mathbf{x}^*)\right)

Corollary

If T2βTT \geq \frac{2\beta}{T}, then ||f(𝐱̂)||22ϵ(f(𝐱(0))f(𝐱*))||\nabla f(\hat{\mathbf{x}})||_2^2 \leq \epsilon \left(f(\mathbf{x}^{(0)})-f(\mathbf{x}^*)\right).


see: Stationary point